Studies have found that, though many students have generally
positive attitudes toward science, technology, engineering, and
math (STEM), their attitudes toward school science are "mixed"
(Sjoberg & Schreiner, 2006). Students' initial interest in science
often dwindles because of the way science is taught in school
(Krajcik, Czerniak, & Berger, 2003).

By contrast, out-of-school time (OST) programs are intermediary
spaces that connect opportunities across a range of contexts
(Noam, Biancarosa, & Dechausay, 2003). STEM experiences in OST
can cultivate and multiply students’ initial interest in science,
helping students to stay motivated and engaged to learn STEM in
school. Afterschool and summer settings are being identified as
environments for engaging youth in STEM and building their
interest in pursuing future STEM careers (Coalition for Science
After School, 2004).

Growing evidence shows that participation in OST activities
positively supports youth development in general (Hall, Yohalem,
Tolman, & Wilson, 2003; Vandell, Reisner, & Pierce, 2007) and STEM
learning in particular (Tamir, 1991). However, simply
participating in a self-identified STEM program is not sufficient.
Youth will benefit more if they participate in quality afterschool
programs (Mahoney, Levine, & Hinga, 2010). In fact, participation
in a low-quality program can negatively affect youth development
(National Institute on Out-of-School Time, 2009). Therefore, a
common understanding of quality indicators in STEM OST is vital
not only for researchers and evaluators but also for afterschool
program leaders and staff.

An important way of knowing whether programs are of high or low
quality is to observe them systematically and reliably. Such
observation is practically impossible without good definitions of
what constitutes quality. Observation tools employing such
definitions and related indicators are being developed and applied
both in schools (Bill and Melinda Gates Foundation, 2012) and in
OST programs (Gitomer, 2012). Reputable observation tools for
assessing STEM instruction in school settings include the Reformed
Teaching Observation Protocol (Piburn et al., 2000) and the
Classroom Observation Protocol (Weiss, Pasley, Smith, Banilower, &
Heck, 2003). The OST field has several observation tools, as
described by Yohalem and Wilson-Ahlstrom (2009), for assessing
program quality generally. However, “instruments designed
specifically for observing informal settings in science are only
now being designed and researched” (Gitomer, 2012, p. 2).

In order to address this gap, researchers at the Program in
Education, Afterschool, and Resiliency (PEAR) created the
Dimensions of Success (DoS) assessment tool to help OST programs
and researchers monitor and measure quality. The DoS tool allows
observers to collect systematic data along 12 quality indicators
to pinpoint the strengths and weaknesses of afterschool science
learning experiences. These data can then be used to guide
technical assistance and professional development and to help
programs choose and modify curricula to meet students’ needs
(Noam & Shah, in press). The previous work on afterschool quality
assessment, especially the research done by Yohalem and colleagues
(2009), along with the existing national frameworks of STEM
assessment, guided the development of the DoS tool.


DoS is taking the lead in establishing definitions of and
indicators for STEM program quality. This paper describes the
development of the DoS tool, outlines its structure and the
professional development that enables its use, and presents a
case study of its application in an urban OST program offering
STEM activities. Use of DoS is facilitating program improvement
in OST programs and networks across the country.

Development of the DoS Observation Tool 

In 2006, the research team at PEAR was invited to evaluate the
effectiveness of the Summer METS (math, engineering, technology,
and science) Initiative, which was established by the Kauffman
Foundation to expand opportunities for student participation in
science and technology-related summer activities and to better
assist underserved youth in metropolitan Kansas City. In 2007, in
addition to surveying 450 Summer METS students and 64 teachers,
observers recorded notes using the first pilot version of the DoS
tool (Noam, Schwartz, Bevan, & Larson, 2007). Based on these
observation data, the tool was further developed in 2008, when 10
programs in Kansas City began using DoS to observe one another in
a peer-to-peer evaluation network (Dahlgren, Larson, & Noam,
2008). Though the programs were all STEM-focused, they were
diverse in many ways. For example, they used different curricula
and served different student populations; they worked in a variety
of configurations, whether school-based or community-based,
free-standing or part of a bigger network. Therefore, the
researchers’ biggest challenge was to standardize DoS so that it
could be applied in a wide variety of programs while still using
the same rubrics, allowing results to be compared across sites
(Dahlgren et al., 2008).

After incorporating feedback from the Summer METS project,
developers worked to expand the usability of the DoS tool and to
pilot it in a wider sample of afterschool programs, starting with
the Informal Learning of Science Afterschool (ILSA) project. As
part of ILSA’s in-depth case studies, trained observers used DoS
in eight afterschool sites in California and Massachusetts,
conducting 115 observations from January 2008 to August 2010. To
triangulate DoS with previously validated observation tools,
researchers also collected data on these programs using the
Promising Practices Rating Scale (PPRS; Wisconsin Center for
Education Research & Policy Studies Associates, 2005) and the
Classroom Observation Protocol (Weiss et al., 2003). PPRS is a
general afterschool observation tool, while the Classroom
Observation Protocol, originally designed for use in schools,
provided a science-specific framework. This process led to further
revisions of the DoS tool.

Alignment with Nationally Recognized Frameworks

Two recent documents were fundamental in shaping quality
indicators for OST STEM learning and accelerated the need for a
quality assessment tool specific to this field. In 2008, the
National Science Foundation (NSF) developed Framework for
Evaluating Impacts of Informal Science Education Projects
(Friedman, 2008), which outlined the main areas in which OST STEM
programs should be evaluated. Additionally, the National Research
Council (NRC, 2009) introduced six strands that describe goals and
practices for informal science learning. The NRC strands, like the
NSF domains, offer a framework for designing quality STEM
experiences in OST and for identifying possible outcomes.
Specifically, the NRC framework highlights the importance of
students’ excitement and interest; their ability to use models and
build explanations, explore and test questions, reflect, and use
scientific language and tools; and their ability to identify as
people who can learn, use, and contribute to science (NRC, 2009).

The NSF framework (Friedman, 2008) defines five 
impact categories for assessment: 

• Awareness, knowledge, or understanding of STEM 
concepts, processes, or careers 

• Engagement or interest in STEM concepts, processes, 
or careers 

• Attitude toward STEM-related topics or capabilities 

• Behaviors related to STEM concepts, processes, or careers 

• Skills based on STEM concepts, processes, or careers 

The researchers’ goal was to align DoS with the NSF framework and
the NRC strands. At the time, three of the four DoS domains were
Engagement or Interest, Content Knowledge & Competence and
Reasoning, and Career Knowledge/Acquisition & Attitude/Behavior.
All three domains are closely related to both the NSF framework
and the NRC strands. As a result of numerous observations, the
researchers felt the need for an additional domain to describe the
curricula, materials, and space offered by afterschool programs,
so they created a fourth domain, Programmatic Features. Over time,
as researchers observed more STEM programs, dimensions within
these domains were modified.


Validation 

In order to make DoS available to a wide spectrum of OST programs,
the development team needed to validate the tool by studying and
reporting its psychometric properties. To accomplish this goal,
PEAR teamed up with Educational Testing Service (ETS) in 2010
under NSF’s Research and Evaluation on Education in Science and
Engineering program. A team of observers was trained to use DoS in
more than 300 STEM programs across seven states.

Teams of two trained observers, who had established initial
inter-rater reliability with each other and with the pool of
observers, observed STEM activities using DoS. These data were
then analyzed to build a validity argument for the tool.
Specifically, developers looked at the distribution of scores for
each dimension, the rater reliability of observers, and the
average scores for each dimension. They also looked for
significant differences in scores from different kinds of programs
(school-based, community-based, museum-sponsored, and so on).
These details established the validity of the DoS tool; they are
available in the NSF final technical report (Shah, Wylie, Gitomer,
& Noam, 2013).
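
The rater reliability mentioned above can be illustrated with a
short sketch. The article does not say which reliability statistic
the developers used, so the two measures below (exact agreement and
Cohen's kappa) are assumptions chosen for illustration, and the
ratings are hypothetical, not data from the validation study.

```python
from collections import Counter


def exact_agreement(rater_a, rater_b):
    """Share of observations on which both raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = exact_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same level
    # if each rated independently with their own level frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical level throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings (1-4) from two observers on eight activities.
observer_1 = [1, 2, 3, 4, 2, 3, 3, 1]
observer_2 = [1, 2, 3, 3, 2, 3, 4, 1]

print(exact_agreement(observer_1, observer_2))
print(round(cohens_kappa(observer_1, observer_2), 3))
```

Exact agreement is easy to explain to program staff, while kappa
discounts the agreement two observers would reach by chance; which
threshold counts as acceptable is a decision for the tool's
developers, not something this sketch settles.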

The Final DoS Tool 

As illustrated in Figure 1, the current version of DoS has 12
dimensions in four domains: Features of the Learning Environment,
Activity Engagement, STEM Knowledge and Practices, and Youth
Development in STEM. Together, the 12 dimensions capture key
components of what makes a quality STEM activity in OST.

The current DoS domains continue to be aligned with NSF categories
and NRC strands, though they are arranged in different “bins.” For
example, the NSF category Engagement and Interest is now covered
by several DoS dimensions including Participation, Engagement with
STEM, and Relevance. The NSF categories Skills and Awareness,
Knowledge, and Understanding are reflected in such DoS dimensions
as STEM Content Learning, Inquiry, and Reflection. Similarly, the
DoS dimension Inquiry aligns with the NRC strand “Manipulate,
test, explore, predict, question, observe, and make sense of the
natural and physical world.” The DoS dimensions Relevance,
Engagement with STEM, Relationships, and Youth Voice contribute
toward NRC strand 1, “excitement, interest, and motivation.” The
12 DoS dimensions work together to cover the range of outcomes in
both the NRC and NSF frameworks.
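
For readers who record DoS ratings electronically, the structure of
12 dimensions in four domains can be sketched as a simple mapping.
The domain and dimension names follow Figure 1; the function name
`domain_means` and the idea of averaging ratings within a domain are
illustrative assumptions, not part of the DoS protocol, and the
sample observation is hypothetical.

```python
from statistics import mean

# Domains and dimensions as listed in Figure 1.
DOS_DOMAINS = {
    "Features of the Learning Environment": [
        "Organization", "Materials", "Space Utilization"],
    "Activity Engagement": [
        "Participation", "Purposeful Activities", "Engagement with STEM"],
    "STEM Knowledge and Practices": [
        "STEM Content Learning", "Inquiry", "Reflection"],
    "Youth Development in STEM": [
        "Relationships", "Relevance", "Youth Voice"],
}


def domain_means(ratings):
    """Average the 1-4 ratings of one observation within each domain.

    Averaging by domain is an illustrative assumption, not a DoS rule.
    """
    result = {}
    for domain, dimensions in DOS_DOMAINS.items():
        scores = []
        for dimension in dimensions:
            score = ratings[dimension]  # KeyError if a dimension is missing
            if score not in (1, 2, 3, 4):
                raise ValueError(f"{dimension}: rating must be 1-4, got {score}")
            scores.append(score)
        result[domain] = mean(scores)
    return result


# Hypothetical observation: every dimension rated 3 except Inquiry.
observation = {d: 3 for dims in DOS_DOMAINS.values() for d in dims}
observation["Inquiry"] = 1

for domain, score in domain_means(observation).items():
    print(f"{domain}: {score:.2f}")
```

Keeping the mapping in one place makes it harder to lose a dimension
when ratings from many observations are combined.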

The DoS protocol consists of a short description of each
dimension, a more elaborate description, commentary for training,
and a four-point rubric. The description defines the dimension;
the elaboration provides more details, presents examples from the
field, and provides tips on scenarios that commonly occur while
observing STEM activities in OST. The commentary for training
highlights key issues for trainees as they learn how to use the
tool. The summary of the rubric provides examples of numerical
ratings on a scale of 1 to 4, where 1 indicates little evidence
and 4 indicates strong evidence of quality in that dimension. Each
level is defined carefully in the rubric so that observers can
distinguish the levels during their observation of an activity.
The rubric for one dimension, Inquiry, is summarized in Figure 2.

Papazian, Noam, Shah, & McCormick

THE QUEST FOR QUALITY IN AFTERSCHOOL SCIENCE

FEATURES OF THE LEARNING ENVIRONMENT
• Organization
• Materials
• Space Utilization

ACTIVITY ENGAGEMENT
• Participation
• Purposeful Activities
• Engagement with STEM

STEM KNOWLEDGE AND PRACTICES
• STEM Content Learning
• Inquiry
• Reflection

YOUTH DEVELOPMENT IN STEM
• Relationships
• Relevance
• Youth Voice

Figure 1. The Final DoS Domains and Dimensions

DoS Training 

To one observer, “inquiry” may mean “experiments,” while to
another it may mean “rich discussions.” Simply reading rubrics and
watching science activities is not enough to make someone a
proficient DoS observer. The text in the rubric helps to guide
observers, but they need training to learn the meaning of each of
the 12 dimensions and how to identify each of the four levels.

DoS training familiarizes participants with the DoS 
tool and prepares them to conduct observations in the field. 
It also calibrates observers’ ratings so that the results are 
reliable and valid. The basic training consists of four steps: 

• Eight hours of content training, online or in person 

• Four to six practice observations in local afterschool 
STEM programs, in pairs 

• A one-hour online calibration session with PEAR 

• Certification for two years, with technical assistance 
and coaching as needed 

The training materials include case studies of real afterschool
science programming, exercises asking observers to critique
evidence from real DoS observations in the field, and observation
simulations using videos of science activities of various levels
of quality.

After completing all the parts of DoS training, new observers are
certified for two years. One training fee covers all four steps,
including continued coaching and technical assistance for two
years to support the successful use of DoS in the field. Certified
DoS observers can use the tool at no additional cost as frequently
as needed to meet their program goals.

Why Use DoS

DoS can be used in flexible ways based on the needs of a
program. Some reasons to use DoS include:

• To help individual programs track their progress over time

• To encourage self-reflection among program staff, who
can use DoS as a common language to discuss the quality
of their activities and to pinpoint areas for improvement

• To aggregate information across individual sites for
large youth-serving organizations such as Ys or for city
or state afterschool networks

• To integrate DoS observations into an experimental
evaluation design using pre- and post-participation
assessments whose findings can be connected to the
quality of the inputs observed using DoS

DoS can be used to help identify the areas where professional
development or coaching may be needed. It provides a common
language that staff members can use as they reflect on the quality
of their science activities. Observers engage in consensus
discussions in which they compare their field notes and ratings to
make sure they have covered all aspects of the activities they
observe and that they leave no room for misinterpretation. They
then use the results of that discussion to frame the feedback
given to staff members to help them improve their activities. DoS
scores, along with the ensuing discussion and feedback, can help
programs improve their curricular activities and pedagogical
approaches.

Because DoS training involves several steps, OST programs will
benefit most if they send staff members or leaders who are
committed to the organization and are likely to stay for at least
a year. Despite the high turnover in afterschool settings, DoS can
become an integral part of a program’s planning, monitoring, and
evaluation process. Its dimensions and quality indicators can be
passed on to new staff members as a common framework for
discussion when, for example, staff participate in curriculum
design or undergo observation to help them improve their
facilitation of activities. We are currently working on a
train-the-trainer model so that program, curriculum, and training
directors can begin to train their own staff and therefore make
DoS an integral part of their program.


HOW TO BE CERTIFIED AS A DOS TRAINER

Researchers or practitioners can become certified DoS observers by
completing the certification process outlined in this article.
Contact PEAR to schedule a training. Training sessions are held
year-round; the schedule can be adjusted to accommodate the needs
of the organization. In-person trainings are great for large state
networks or organizations looking to train their whole team, while
online webinars accommodate participants from different locations.


1 EVIDENCE ABSENT
There is minimal evidence that students are engaging in STEM
practices during activities. Students observe experiments or
demonstrations, or are given data, but do not participate in
inquiry practices on their own.

2 INCONSISTENT EVIDENCE
There is weak evidence that students are engaging in STEM
practices during activities. Students follow cookbook experiments
where they are given step-by-step directions, or may be given data
or facts instead of collecting them. Some aspects of the activity
will support students engaging in STEM practices, but it is quite
scripted or unnatural.

3 REASONABLE EVIDENCE
There is clear evidence that students are engaging in some STEM
practices during the activities. Students engage in STEM
practices; however, there may be uneven use of these practices by
students or the level of support during inquiry may not be
appropriate for the group of students.

4 COMPELLING EVIDENCE
There is consistent evidence that students are engaging in a range
of STEM practices during the activities. Students have multiple
opportunities to ask questions; to think like scientists,
mathematicians, and engineers; and to engage in STEM practices
that allow them to investigate questions as they are appropriately
guided by the facilitator.

Figure 2. Summary of the DoS Rubric: Inquiry Dimension


A Case from the Field 

To illustrate how DoS can be applied in the field and to provide
practical details on DoS training, we next describe a case study
of DoS observations conducted from summer 2009 to spring 2010 at
East End House, a community center in Cambridge, Massachusetts. At
the time of the study, East End House served 100 youth, ages
11-14, the majority of whom were eligible for free or
reduced-price lunch. Approximately 60 percent of the participants
were male and 40 percent female. The racial and ethnic composition
of the student population was 35 percent African American, 25
percent Caucasian, 20 percent Hispanic, 10 percent Asian, and 10
percent other.

DoS was applied in this program both as a quality observation tool
to pinpoint strengths and weaknesses and as a professional
development tool to help the staff plan and revise their
activities. PEAR trained the afterschool staff to use DoS,
conducted observations of STEM activities, and provided feedback
and recommendations that staff incorporated into their STEM
curricula. In total, 19 one-hour observations were conducted,
eight before the afterschool staff received DoS training and 11
after the DoS training.

The one-day training included information about existing quality
frameworks in OST science, including the NSF framework and the NRC
strands; approaches to quality in OST STEM instruction; and the
development of DoS. More importantly, participants practiced
applying the tool by watching videos of STEM activities, rating
them, and reaching consensus on the ratings in small groups.
Later, their ratings were calibrated with those of PEAR observers
to establish inter-rater reliability with the tool’s developers.
Inter-rater reliability was also established during practice field
observations by comparing the ratings of pairs of newly trained
observers. This process ensures that DoS is being used
consistently and accurately, regardless of who the observer is.

After the East End House staff received training in DoS, PEAR and
East End House staff began conducting observations together. PEAR
observers were paired with the East End House middle school
program director and the curriculum director. Each pair observed
one activity at a time and then discussed their ratings to reach
consensus. The feedback was communicated to the facilitator of the
observed activity. At the beginning of each curricular unit, the
two directors worked with front-line staff to develop new
curricula, incorporating the findings from the DoS observations.
PEAR also used observation data to recommend ways that East End
House could improve its programming.

The observed curricula were developed by the afterschool staff.
Some examples of curriculum units included Numbers Behind Sports,
Body Movement, Music by Me, and Green Thumbs Club. On average, the
units were offered three times a week for four weeks. Typically
one facilitator taught each unit, while groups of students
rotated, so that all students got through all of the available
curricular units during the academic term. During our study, four
facilitators were teaching the units; there was no facilitator
turnover. All facilitators had or were working toward bachelor’s
degrees. Only one had a science background.

Figure 3 compares the findings of the eight pre-training
observations with those of the 11 post-training observations,
using the dimensions that made up the DoS domains at the time of
the East End House case study. (See Figure 1 for the current
domains and dimensions.) Post-training quality ratings for each of
the 11 dimensions (see Note 1) increased relative to pre-training
observations. The mean difference between pre-training and
post-training scores was significant for nine dimensions: Planning
and Preparation, Materials, Space, Engagement, Interest,
Exploration, Investigation, Broadening Perspective, and Relevance.
The only dimensions that did not show significant gains were
Content Learning and Structure.
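
A comparison of this kind, eight pre-training ratings against 11
post-training ratings on one dimension, can be sketched as follows.
The article does not name the significance test that was used, so
the exact permutation test below is a swapped-in technique chosen
because it suits small samples; the function names and the scores
are hypothetical and are not the East End House data.

```python
from itertools import combinations
from statistics import mean


def mean_difference(pre, post):
    """Post-training mean minus pre-training mean for one dimension."""
    return mean(post) - mean(pre)


def exact_permutation_p(pre, post):
    """Two-sided exact permutation p-value for the difference in means.

    Every way of relabeling the pooled scores as 'pre' and 'post'
    is enumerated, which is feasible for samples this small.
    """
    pooled = list(pre) + list(post)
    total = sum(pooled)
    n_pre, n_post = len(pre), len(post)
    observed = abs(mean_difference(pre, post))
    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_pre):
        pre_sum = sum(pooled[i] for i in idx)
        diff = (total - pre_sum) / n_post - pre_sum / n_pre
        # Small tolerance guards against floating-point rounding.
        if abs(diff) >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical 1-4 ratings on a single dimension (not study data).
pre_scores = [1, 2, 2, 1, 2, 2, 1, 2]            # 8 pre-training observations
post_scores = [3, 3, 2, 3, 4, 3, 3, 2, 3, 3, 4]  # 11 post-training observations

print(mean_difference(pre_scores, post_scores))
print(exact_permutation_p(pre_scores, post_scores))
```

A small p-value here says only that the observed gap would be
unusual if training made no difference to the ratings; it does not
by itself show what caused the change.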

This case study suggests a correlation between use of the DoS tool
and quality improvement. The study was not designed to confirm a
causal relationship between the DoS training and an increase in
quality of STEM programming. It used DoS as a formative instrument
to help East End House improve its training and programming. A
summative study, by contrast, would separate the external
evaluators from the observers; the evaluators would analyze the
observers’ data. Moreover, an experimental design with treatment
and control groups would be the only way to establish a causal
relationship between DoS and STEM quality. Thus, this case study
cannot pinpoint exactly what influenced the quality improvement.
However, it does suggest that the focused training and feedback
DoS provides were associated with positive trends in quality.

Staff interviews confirmed the importance of DoS to the STEM
programming at East End House. DoS training enabled afterschool
staff members to look at the program from an outsider’s
perspective and to strive to achieve quality. They also became
familiar with national frameworks of STEM quality assessment and
with the dimensions of STEM quality in OST. In follow-up
interviews, front-line staff reported feeling more confident in
their STEM teaching and in their understanding of what quality
STEM activities look like. Staff members stated not only that the
DoS training was important but also that actual use of the tool,
with time for reflection and planning, greatly enhanced their
ability to develop and implement quality STEM activities. One
activity facilitator said:

“When we started doing science in our afterschool program, before
being trained on DoS, we didn’t do inquiry; we didn’t know how to
teach content. We did a lot of projects without a lot of depth.
But now, we build lessons around student voice that engage kids in
really understanding science, making meaning of their world, and
using critical thinking skills.”

Figure 3. Comparison of DoS Dimension Average Scores, Pre- and Post-Training


The staff also reported that they were able to use their 
newly acquired skills to engage students in better science 
experiences. This improvement is exemplified in the words 
of another facilitator, who said, “One of the things we’re best 
at now is helping kids to make their own meaning and draw 
their own conclusions. Now they get to do the thinking.” 

Strengthening the Investment in Afterschool STEM Quality

Our findings extend the creative work of a number of program
assessment and quality observation tools by helping to define
quality STEM education and enabling practitioners to observe STEM
activities systematically. The DoS platform of observation and
data-driven professional development can support programs to build
practices that foster student interest and engagement in STEM.

Findings from the case study of East End House can be generalized
to a wide spectrum of afterschool science programs. Although each
program is unique, the DoS tool is designed to help afterschool
staff identify the strengths and weaknesses in their STEM
instruction so that, through consensus discussions, they can work
to improve the program.

As a next step, the Mott Foundation, in collaboration with the
Noyce Foundation, has created a technical assistance team to
support nine state afterschool STEM networks. In each state, we
will train teams to use DoS and certify them when they have
reached acceptable levels of reliability. This large-scale project
has several components, including training professionals to use
DoS and comparing DoS observation data with students’ expressions
of science interest and facilitators’ self-reports on their
science programming. We are also planning to give DoS training to
afterschool providers across California. In the meantime, the DoS
tool has been adopted successfully in many afterschool networks.
We have built the infrastructure to serve many regions and
organizations across the country.

We have collected valuable data describing quality across a range
of sites and have seen improvement when OST staff systematically
observe their STEM activities. Through continued analysis of the
data, we are able to improve our training process and prepare
observers to achieve the most accurate ratings possible. The
practical feedback provided by certified observers can immediately
be used to improve OST STEM programming.

Public and private funders are investing millions of dollars to
get students interested and engaged in science outside of school.
Use of DoS helps the OST field to demonstrate that the quality of
our STEM instruction is strong and that it can lead to the student
outcomes that funders, researchers, and practitioners alike are
working to achieve.

Notes

1 Data from the Belonging/Relationship dimension were removed from
analysis. During development, the protocol for this dimension was
changed several times. This dimension therefore was not deemed
consistent enough for the purpose of this paper.